equal treatment
Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas
Chiu, Yu Ying, Wang, Zhilin, Maiya, Sharan, Choi, Yejin, Fish, Kyle, Levine, Sydney, Hubinger, Evan
Detecting AI risks becomes more challenging as stronger models emerge and find novel methods such as Alignment Faking to circumvent these detection attempts. Inspired by how risky behaviors in humans (i.e., illegal activities that may hurt others) are sometimes guided by strongly-held values, we believe that identifying values within AI models can be an early warning system for AI's risky behaviors. We create LitmusValues, an evaluation pipeline to reveal AI models' priorities on a range of AI value classes. Then, we collect AIRiskDilemmas, a diverse collection of dilemmas that pit values against one another in scenarios relevant to AI safety risks such as Power Seeking. By measuring an AI model's value prioritization using its aggregate choices, we obtain a self-consistent set of predicted value priorities that uncover potential risks. We show that values in LitmusValues (including seemingly innocuous ones like Care) can predict both seen risky behaviors in AIRiskDilemmas and unseen risky behaviors in HarmBench.
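As a rough illustration of ranking values from aggregate choices, the sketch below turns a list of dilemma outcomes into a priority ordering. The abstract does not say how choices are aggregated, so the win-rate rule here is an assumption, and the choice records and the value names other than Care are hypothetical.

```python
from collections import Counter

# Hypothetical records: (value the model chose, value it gave up) per dilemma.
choices = [
    ("Care", "Truthfulness"),
    ("Care", "Truthfulness"),
    ("Truthfulness", "Care"),
    ("Care", "Autonomy"),
    ("Truthfulness", "Autonomy"),
    ("Autonomy", "Truthfulness"),
]

# Count how often each value was chosen and how often it appeared at all.
wins = Counter(chosen for chosen, _ in choices)
appearances = Counter(value for pair in choices for value in pair)

# Assumed aggregation rule: fraction of dilemmas in which the value won.
win_rate = {value: wins[value] / appearances[value] for value in appearances}

# Highest win rate first; ties broken alphabetically for determinism.
ranking = sorted(win_rate, key=lambda value: (-win_rate[value], value))

for rank, value in enumerate(ranking, start=1):
    print(rank, value, round(win_rate[value], 3))
```

A rating scheme that accounts for the strength of the opposing value could replace the plain win rate without changing the surrounding structure.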
The Neutrality Fallacy: When Algorithmic Fairness Interventions are (Not) Positive Action
Weerts, Hilde, Xenidis, Raphaële, Tarissan, Fabien, Olsen, Henrik Palmer, Pechenizkiy, Mykola
Various metrics and interventions have been developed to identify and mitigate unfair outputs of machine learning systems. While individuals and organizations have an obligation to avoid discrimination, the use of fairness-aware machine learning interventions has also been described as amounting to 'algorithmic positive action' under European Union (EU) non-discrimination law. As the Court of Justice of the European Union has been strict when it comes to assessing the lawfulness of positive action, this would impose a significant legal burden on those wishing to implement fair-ml interventions. In this paper, we propose that algorithmic fairness interventions often should be interpreted as a means to prevent discrimination, rather than a measure of positive action. Specifically, we suggest that this category mistake can often be attributed to neutrality fallacies: faulty assumptions regarding the neutrality of fairness-aware algorithmic decision-making. Our findings raise the question of whether a negative obligation to refrain from discrimination is sufficient in the context of algorithmic decision-making. Consequently, we suggest moving away from a duty to 'not do harm' towards a positive obligation to actively 'do no harm' as a more adequate framework for algorithmic decision-making and fair ml-interventions.
Kantian Deontology Meets AI Alignment: Towards Morally Robust Fairness Metrics
Deontological ethics, specifically as understood through Immanuel Kant, provides a moral framework that emphasizes the importance of duties and principles rather than the consequences of action. Despite its prominence, deontology is currently overlooked as an approach to fairness metrics; this paper therefore explores the compatibility of a Kantian deontological framework with fairness metrics, which form part of the AI alignment field. We revisit Kant's critique of utilitarianism, which is the primary approach in AI fairness metrics, and argue that fairness principles should align with the Kantian deontological framework. By integrating Kantian ethics into AI alignment, we not only bring in a widely accepted, prominent moral theory but also strive for a more morally grounded AI landscape that better balances outcomes and procedures in pursuit of fairness and justice.
Beyond Demographic Parity: Redefining Equal Treatment
Mougan, Carlos, State, Laura, Ferrara, Antonio, Ruggieri, Salvatore, Staab, Steffen
Liberalism-oriented political philosophy reasons that all individuals should be treated equally independently of their protected characteristics. Related work in machine learning has translated the concept of \emph{equal treatment} into terms of \emph{equal outcome} and measured it as \emph{demographic parity} (also called \emph{statistical parity}). Our analysis reveals that the two concepts of equal outcome and equal treatment diverge; therefore, demographic parity does not faithfully represent the notion of \emph{equal treatment}. We propose a new formalization for equal treatment by (i) considering the influence of feature values on predictions, such as computed by Shapley values decomposing predictions across its features, (ii) defining distributions of explanations, and (iii) comparing explanation distributions between populations with different protected characteristics. We show the theoretical properties of our notion of equal treatment and devise a classifier two-sample test based on the AUC of an equal treatment inspector. We study our formalization of equal treatment on synthetic and natural data. We release \texttt{explanationspace}, an open-source Python package with methods and tutorials.
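The gap between equal outcome and equal treatment can be seen in a toy case. The sketch below is not the \texttt{explanationspace} API; it is a minimal stdlib-only illustration under several assumptions: a hypothetical linear model with unit weights, the closed-form Shapley value for linear models with independent features, phi_i = w_i (x_i - E[x_i]), and a fixed hand-chosen inspector score in place of a trained classifier. The data are made up.

```python
from statistics import mean

def auc(scores_pos, scores_neg):
    """Mann-Whitney AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical linear model f(x) = x1 + x2 and two protected groups.
WEIGHTS = (1.0, 1.0)
group_a = [(1.0, 0.0), (0.0, 0.0)]
group_b = [(0.0, 1.0), (0.0, 0.0)]

def predict(x):
    return sum(w * v for w, v in zip(WEIGHTS, x))

def linear_shapley(x, feature_means):
    # Closed form for a linear model with independent features (assumption).
    return tuple(w * (v - m) for w, v, m in zip(WEIGHTS, x, feature_means))

feature_means = [mean(column) for column in zip(*(group_a + group_b))]

# Equal outcome: the prediction distributions of the two groups coincide.
pred_auc = auc([predict(x) for x in group_a], [predict(x) for x in group_b])

def inspector(phi):
    # Hand-chosen stand-in for a trained inspector: contrast of attributions.
    return phi[0] - phi[1]

# Equal treatment: compare the explanation distributions instead.
expl_auc = auc(
    [inspector(linear_shapley(x, feature_means)) for x in group_a],
    [inspector(linear_shapley(x, feature_means)) for x in group_b],
)

print("AUC on predictions: ", pred_auc)
print("AUC on explanations:", expl_auc)
```

An AUC of 0.5 on predictions means the groups cannot be told apart by outcome, so demographic parity holds here, while an AUC well above 0.5 on explanations means the model reaches those outcomes through different features in each group.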
Should Bank Stress Tests Be Fair?
Regulatory stress tests have become one of the main tools for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios in shared stress scenarios. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask, what is a fair aggregation of individually tailored models into a common model? We argue that simply pooling data across banks treats banks equally but is subject to two deficiencies: it may distort the impact of legitimate portfolio features, and it is vulnerable to implicit misdirection of legitimate information to infer bank identity. We compare various notions of regression fairness to address these deficiencies, considering both forecast accuracy and equal treatment. In the setting of linear models, we argue for estimating and then discarding centered bank fixed effects as preferable to simply ignoring differences across banks. We present evidence that the overall impact can be material. We also discuss extensions to nonlinear models.
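The contrast between pooling and estimating-then-discarding fixed effects can be sketched for a one-feature linear model. This is only an illustration under assumptions: the data are made up, both hypothetical banks follow the same slope with different intercepts, the slope is estimated by within-bank demeaning, and the fixed effects are centered with equal weight per bank.

```python
from statistics import mean

# Hypothetical data: bank -> (portfolio feature x, loss outcome y).
# Both banks share slope 2; their intercepts (1 and -3) differ.
data = {
    "bank_a": ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),
    "bank_b": ([4.0, 5.0, 6.0], [5.0, 7.0, 9.0]),
}

def ols_slope(xs, ys):
    """Slope of a one-variable least-squares fit with an intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Pooling: ignore bank identity entirely.
all_x = [x for xs, _ in data.values() for x in xs]
all_y = [y for _, ys in data.values() for y in ys]
pooled_slope = ols_slope(all_x, all_y)

# Fixed effects: demean within each bank, then fit the shared slope.
dx, dy = [], []
for xs, ys in data.values():
    dx += [x - mean(xs) for x in xs]
    dy += [y - mean(ys) for y in ys]
fe_slope = ols_slope(dx, dy)

# Recover each bank's intercept, center the effects, and discard them.
effects = {b: mean(ys) - fe_slope * mean(xs) for b, (xs, ys) in data.items()}
common_intercept = mean(list(effects.values()))
centered = {b: a - common_intercept for b, a in effects.items()}

def common_model(x):
    # The same model for every bank: centered effects are dropped.
    return common_intercept + fe_slope * x

print("pooled slope:", round(pooled_slope, 4))
print("fixed-effects slope:", fe_slope)
print("centered effects (discarded):", centered)
```

In this toy case pooling understates the effect of the portfolio feature because the feature is correlated with bank identity, whereas the fixed-effects fit recovers the shared slope and still yields a single model applied to all banks.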
Closed-Loop View of the Regulation of AI: Equal Impact across Repeated Interactions
Zhou, Quan, Ghosh, Ramen, Shorten, Robert, Marecek, Jakub
Recently, there has been considerable interest in the regulation of artificial intelligence (AI). It is increasingly recognized that so-called high-risk applications of AI, such as in Human Resources, Retail Banking, or within public schools, be it admissions or assessment, cannot be served by black-box AI systems with no human control. It is not clear [10], however, how to phrase even the desiderata for the regulation of AI. Here, we suggest that the desiderata could be the same as in the Civil Rights Act of 1964 and much of the subsequent civil-rights legislation worldwide: equal treatment and equal impact. At the same time, we point out that these desiderata could be in conflict [34]. Let us illustrate the conflict with an example of a system that performs credit-risk estimation in a consumer-credit company.
AI Fairness: from Principles to Practice
Bateni, Arash, Chan, Matthew C., Eitel-Porter, Ray
This paper summarizes and evaluates various approaches, methods, and techniques for pursuing fairness in artificial intelligence (AI) systems. It examines the merits and shortcomings of these measures and proposes practical guidelines for defining, measuring, and preventing bias in AI. In particular, it cautions against some of the simplistic, yet common, methods for evaluating bias in AI systems, and offers more sophisticated and effective alternatives. The paper also addresses widespread controversies and confusions in the field by providing a common language among different stakeholders of high-impact AI systems. It describes various trade-offs involving AI fairness, and provides practical recommendations for balancing them. It offers techniques for evaluating the costs and benefits of fairness targets, and defines the role of human judgment in setting these targets. This paper provides discussions and guidelines for AI practitioners, organization leaders, and policymakers, as well as various links to additional materials for a more technical audience. Numerous real-world examples are provided to clarify the concepts, challenges, and recommendations from a practical perspective.
Discrimination in machine learning algorithms
Pappadà, Roberta, Pauli, Francesco
A human may discriminate either because of irrational prejudice induced by ignorance and stereotypes or based on statistical generalization: lacking specific information on an individual, he is assigned the characteristics prevalent in the sensitive attribute category he belongs to. For example, in the United States, lacking information on education, a black person may be assumed to have a relatively low level of education (since this is the case in general for black people in the country) [7]. When a statistical or machine learning algorithm is used in the decision process, its behavior concerning discrimination depends on the information it is given. In particular, if the sensitive attribute is available to the algorithm (i.e., it is included in the learning data and can be used for predictions), it may discriminate either because the data it is taught contain irrational prejudice (Figure 1(a)) or because the sensitive attribute is associated with an unobserved attribute that is relevant for the prediction of Y, the outcome of interest (Figure 1(b)).
AI Taking A Knee: Action To Improve Equal Treatment Under The Law
In the wake of the George Floyd tragedy and so many other appalling cases like it, a growing question is whether a solution lies with robot police powered by artificial intelligence (AI). In theory, AI cops could reduce biased and discriminatory practices and improve access to justice. Pop culture is filled with heroes like this, such as Robocop and CHAPPiE. However, reality may be a little stranger than fiction in this case, as some robots are already in action for law enforcement. Let's start with Robo-Guard, which works in the South Korean prison system.